A Parallel Corpus Labeled Using Open and Restricted Domain Ontologies
Authors
Abstract
The analysis and creation of annotated corpora are fundamental to implementing natural language processing solutions based on machine learning. In this paper we present a parallel corpus of 4500 questions in Spanish and English in the tourism domain, obtained from real users. With the aim of training a question answering system, the questions were labeled with their expected answer type according to two different ontologies. The first is an open-domain ontology based on Sekine's Extended Named Entity Hierarchy, while the second is a restricted-domain ontology specific to the tourism field. Because the two ontologies have different characteristics, we had to resolve many problematic cases and adapt our annotation approach to the characteristics of each one. We present an analysis of the domain coverage of these ontologies and the results of the inter-annotator agreement. Finally, we use a question classification system to evaluate the labeling of ...
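The abstract reports inter-annotator agreement but does not say which measure was used. The sketch below assumes Cohen's kappa, a common choice when two annotators assign categorical labels: κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed agreement and p_e is the agreement expected by chance. The function name `cohen_kappa` and the labels `LOCATION`, `DATE` and `MONEY` are hypothetical illustrations. They are not taken from the corpus or from either ontology.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items, in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty list of items")
    n = len(labels_a)

    # Observed agreement p_o: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement p_e: probability that both pick the same label independently,
    # estimated from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical expected-answer-type labels for six questions (illustration only).
    annotator_1 = ["LOCATION", "DATE", "LOCATION", "MONEY", "DATE", "LOCATION"]
    annotator_2 = ["LOCATION", "DATE", "MONEY", "MONEY", "DATE", "LOCATION"]
    print(round(cohen_kappa(annotator_1, annotator_2), 2))  # 0.75
```

When questions are labeled against two ontologies, agreement would be computed separately for each label set. A fine-grained open-domain hierarchy and a compact tourism ontology can have quite different label distributions, and therefore different chance-agreement baselines p_e.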
Similar resources
Identification of Ontological Relations Using Formal Concept Analysis
In this paper we present an approach for the automatic identification of relations in restricted-domain ontologies. We use the evidence found in a corpus associated with the same domain as the ontology to determine the validity of the ontological relations. Our approach employs formal concept analysis, a data analysis method that in this case is used for relation discovery in ...
Building frame-based corpus on the basis of ontological domain knowledge
Semantic Role Labeling (SRL) plays a key role in many NLP applications. The development of SRL systems for the biomedical domain is hampered by the lack of large domain-specific corpora labeled with semantic roles. Corpus development has been very expensive and time-consuming. In this paper we propose a method for building a frame-based corpus on the basis of domain knowledge provided b...
A Review of Ontologies Developed According to the Principles of Open Biomedical Ontologies
Background and Aim: Ontologies facilitate data integration, exchange, searching, and querying. The Open Biomedical Ontologies (OBO) Foundry is a solution for creating reference ontologies. In this foundry, ontologies are designed according to established principles that allow them to interact as a single system. The purpose of this study is to determine the main features of ontologies develop...
Deploying Semantic Resources for Open Domain Question Answering
This thesis investigates how semantic resources can be deployed to improve the accuracy of an open domain question answering (QA) system. In particular, two types of semantic resources have been utilized to answer factoid questions: (1) Semantic parsing techniques are applied to analyze questions for semantic structures and to find phrases in the knowledge source that match these structures. (2...
Domain-Specific Ontology Mapping by Corpus-Based Semantic Similarity
Mapping heterogeneous ontologies is usually performed manually by domain experts, or by computer programs that compare the structures of the ontologies and the linguistic semantics of their concepts. In this work, we take a different approach to compare and map the concepts of heterogeneous domain-specific ontologies by using a document corpus in a domain similar to the domain of ...
Publication date: 2009